Smart Paper Notes

The State of Sparse Training in Deep Reinforcement Learning

Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

Categories: Machine Learning | Artificial Intelligence

2022-06-17

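The use of sparse neural networks has grown rapidly in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters that must be trained and stored, and from improved learning efficiency. Somewhat surprisingly, very few efforts have explored their use in deep reinforcement learning (DRL). In this work, we conduct a systematic investigation of applying a number of existing sparse training techniques to a variety of DRL agents and environments. Our results corroborate the findings from sparse training in the computer vision domain: in DRL, too, sparse networks perform better than dense networks at the same parameter count. We provide a detailed analysis of how the various components of DRL are affected by the use of sparse networks, and conclude by suggesting promising avenues for improving the effectiveness of sparse training methods and for advancing their use in DRL.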

Translated by Google Translate

Step-unrolled Denoising Autoencoders for Text Generation

Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

Categories: Natural Language Processing | Machine Learning

2021-12-13

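In this paper, we propose a new generative model of text, the Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is applied repeatedly to a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iterations than diffusion methods, while qualitatively producing better samples on natural language datasets. SUNDAE achieves state-of-the-art results (among non-autoregressive methods) on the WMT'14 English-to-German translation task, and good qualitative results on unconditional language modeling on the Colossal Cleaned Common Crawl dataset and on a dataset of Python code from GitHub. The non-autoregressive nature of SUNDAE opens up possibilities beyond left-to-right prompted generation, by filling in arbitrary blank patterns in a template.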

Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark

Categories: Natural Language Processing | Machine Learning

2021-12-08

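We enhance autoregressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with the preceding tokens. Our Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1 despite using 25$\times$ fewer parameters. After fine-tuning, RETRO's performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen BERT retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than is typically consumed during training. We typically train RETRO from scratch, but can also rapidly retrofit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.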

Rigging the Lottery: Making All Tickets Winners

Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

Categories:

2019-11-25

Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50, MobileNets on ImageNet-2012, and RNNs on WikiText-103. Finally, we provide some insights into why allowing the topology to change during the optimization can overcome local minima encountered when the topology remains static.
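
The topology update described above (parameter magnitudes decide what to remove, occasional gradient calculations decide what to add) can be illustrated with a minimal sketch. This is a simplified reading of the abstract, not the authors' implementation: the function name `update_topology`, the flat-list layout, the choice to grow only among connections that were inactive before the step, and the zero initialization of grown connections are all assumptions made for illustration.

```python
def update_topology(weights, mask, grads, k):
    """One drop-and-grow step on a flat layer (hypothetical helper).

    weights: list of floats (inactive entries are 0.0)
    mask:    list of bools, True where a connection is active
    grads:   dense gradient for every position, active or not
    k:       number of connections to swap
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    k = min(k, len(active), len(inactive))

    # Drop: the k active connections with the smallest weight magnitude.
    drop = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Grow: the k inactive connections with the largest gradient magnitude.
    grow = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)[:k]

    new_weights, new_mask = list(weights), list(mask)
    for i in drop:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in grow:
        new_mask[i] = True
        new_weights[i] = 0.0  # assumption: grown connections start at zero
    return new_weights, new_mask


# Illustrative values only.
weights = [0.9, 0.0, -0.05, 0.0, 0.4, 0.0]
mask = [True, False, True, False, True, False]
grads = [0.1, 0.7, 0.2, -0.9, 0.0, 0.05]
new_weights, new_mask = update_topology(weights, mask, grads, k=1)
```

Because every dropped connection is matched by a grown one, the number of active parameters stays constant, which is the fixed-parameter-count property the abstract emphasizes.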

Mixed Precision Training

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh

Categories:

2017-10-10

Increasing the size of a neural network typically improves accuracy but also increases the memory and compute requirements for training the model. We introduce methodology for training deep neural networks using half-precision floating point numbers, without losing model accuracy or having to modify hyperparameters. This nearly halves memory requirements and, on recent GPUs, speeds up arithmetic. Weights, activations, and gradients are stored in IEEE half-precision format. Since this format has a narrower range than single-precision we propose three techniques for preventing the loss of critical information. Firstly, we recommend maintaining a single-precision copy of weights that accumulates the gradients after each optimizer step (this copy is rounded to half-precision for the forward- and back-propagation). Secondly, we propose loss-scaling to preserve gradient values with small magnitudes. Thirdly, we use half-precision arithmetic that accumulates into single-precision outputs, which are converted to half-precision before storing to memory. We demonstrate that the proposed methodology works across a wide variety of tasks and modern large scale (exceeding 100 million parameters) model architectures, trained on large datasets.
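
Two of the techniques above, loss scaling and the higher-precision master copy of the weights, can be illustrated numerically with a small sketch. It uses only Python's `struct` module, whose `"e"` format is IEEE 754 half precision. The helper `to_half`, the gradient value `1e-8`, the scale `1024`, and the update size `1e-4` are illustrative assumptions, and Python's double-precision floats stand in for the single-precision copy.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 1) Loss scaling: a tiny gradient underflows to zero in half precision,
#    but survives if it is scaled up before rounding and scaled back
#    down afterwards in higher precision.
tiny_grad = 1e-8
loss_scale = 1024.0
unscaled = to_half(tiny_grad)              # flushed to zero
scaled = to_half(tiny_grad * loss_scale)   # representable (subnormal)
recovered = scaled / loss_scale            # unscaled in higher precision

# 2) Master weights: small updates vanish when added directly to a
#    half-precision weight, but accumulate in a higher-precision copy.
update = 1e-4
half_only = 1.0
master = 1.0
for _ in range(100):
    half_only = to_half(half_only + update)  # rounds back to 1.0 each step
    master += update                         # accumulates normally
working_copy = to_half(master)  # rounded copy used for forward/backward
```

In this sketch `half_only` never moves away from 1.0, because the spacing between half-precision values near 1.0 is about 1e-3, while `working_copy` reflects the accumulated updates.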

LOSDD: Leave-Out Support Vector Data Description for Outlier Detection

Daniel Boiar, Thomas Liebig, Erich Schubert

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-27

Support Vector Machines have been successfully used for one-class classification (OCSVM, SVDD) when trained on clean data, but they work much worse on dirty data: outliers present in the training data tend to become support vectors, and are hence considered "normal". In this article, we improve the effectiveness to detect outliers in dirty training data with a leave-out strategy: by temporarily omitting one candidate at a time, this point can be judged using the remaining data only. We show that this is more effective at scoring the outlierness of points than using the slack term of existing SVM-based approaches. Identified outliers can then be removed from the data, such that outliers hidden by other outliers can be identified, to reduce the problem of masking. Naively, this approach would require training N individual SVMs (and training $O(N^2)$ SVMs when iteratively removing the worst outliers one at a time), which is prohibitively expensive. We will discuss that only support vectors need to be considered in each step and that by reusing SVM parameters and weights, this incremental retraining can be accelerated substantially. By removing candidates in batches, we can further improve the processing time, although it obviously remains more costly than training a single SVM.

Stop using the elbow criterion for k-means and how to choose the number of clusters instead

Erich Schubert

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-23

A major challenge when using k-means clustering is often how to choose the parameter k, the number of clusters. In this letter, we want to point out that it is very easy to draw poor conclusions from a common heuristic, the "elbow method". Better alternatives have been known in the literature for a long time, and we want to draw attention to some of these easy-to-use options that often perform better. This letter is a call to stop using the elbow method altogether, because it severely lacks theoretical support, and we want to encourage educators to discuss the problems of the method (if introducing it in class at all) and teach alternatives instead, while researchers and reviewers should reject conclusions drawn from the elbow method.

Comparison of Data Representations and Machine Learning Architectures for User Identification on Arbitrary Motion Sequences

Christian Schell, Andreas Hotho, Marc Erich Latoschik

Categories: Machine Learning

2022-10-02

Reliable and robust user identification and authentication are important and often necessary requirements for many digital services. They become paramount in social virtual reality (VR) to ensure trust, specifically in digital encounters with lifelike, realistic-looking avatars as faithful replications of real persons. Recent research has shown that the movements of users in extended reality (XR) systems carry user-specific information and can thus be used to verify their identities. This article compares three different potential encodings of the motion data from head and hands (scene-relative, body-relative, and body-relative velocities), and the performances of five different machine learning architectures (random forest, multi-layer perceptron, fully recurrent neural network, long short-term memory, gated recurrent unit). We use the publicly available dataset "Talking with Hands" and publish all code to allow reproducibility and to provide baselines for future work. After hyperparameter optimization, the combination of a long short-term memory architecture and body-relative data outperformed competing combinations: the model correctly identifies any of the 34 subjects with an accuracy of 100% within 150 seconds. Altogether, our approach provides an effective foundation for behaviometric-based identification and authentication to guide researchers and practitioners. Data and code are published under https://go.uniwue.de/58w1r.

Clustering by Direct Optimization of the Medoid Silhouette

Lars Lenssen, Erich Schubert

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-26

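The evaluation of clustering results is difficult and highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures that try to provide a general measure for validating clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, and provide two fast versions for its direct optimization. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements. One of the versions guarantees results equal to those of the original variant and provides a run-time speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm.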
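
For orientation, a medoid-based silhouette score can be sketched as follows. The abstract does not give the formula, so this relies on the commonly used definition as an assumption: each point scores 1 - d1/d2, where d1 and d2 are its distances to the nearest and second-nearest medoid, and the scores are averaged. The function name, the 0/0 convention, and the toy data are illustrative, and this sketch only evaluates a given set of medoids; it does not implement the fast optimization the paper proposes.

```python
import math


def medoid_silhouette(points, medoids):
    """Average medoid silhouette (hypothetical helper, needs >= 2 medoids)."""
    if len(medoids) < 2:
        raise ValueError("need at least two medoids")
    total = 0.0
    for p in points:
        dists = sorted(math.dist(p, m) for m in medoids)
        d1, d2 = dists[0], dists[1]
        # Assumed convention: treat 0/0 as 0, so the score is 1.
        total += 1.0 if d2 == 0 else 1.0 - d1 / d2
    return total / len(points)


# Toy one-dimensional data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = medoid_silhouette(points, [(0.0,), (10.0,)])  # one medoid per group
bad = medoid_silhouette(points, [(0.0,), (1.0,)])    # both in the same group
```

Because each point needs only its two nearest medoids, evaluating the score costs one pass over the point-to-medoid distances, whereas the original Silhouette needs distances between all pairs of points.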

On Projections to Linear Subspaces

Erik Thordsen, Erich Schubert

Categories: Machine Learning | Machine Learning (Statistics)

2022-09-26

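The merit of projecting data onto linear subspaces is well known from, for example, dimensionality reduction. One key aspect of subspace projections, the maximum preservation of variance (principal component analysis), has been thoroughly researched, and the effect of random linear projections on measures such as intrinsic dimensionality is still an ongoing effort. In this paper, we investigate the less explored depths of linear projections onto explicit subspaces of varying dimensionality and the expectations of variance that ensue. The result is new bounds for Euclidean distances and inner products. We showcase the quality of these bounds and investigate their close relationship to intrinsic dimensionality estimation.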
